# DPO Reinforcement Learning
## Self Biorag 7b Olaph
Organization: dmis-lab
Tags: Large Language Model, Transformers, English

A fine-tuned version of Minbyul/selfbiorag-7b-wo-kqa_golden-iter-dpo-step3-filtered, trained with Direct Preference Optimization (DPO) on the HuggingFace MedLFQA dataset (excluding kqa_golden).
## Noromaid 7B 0.4 DPO
Organization: NeverSleep
Tags: Large Language Model, Transformers

A 7B-parameter large language model co-created by IkariDev and Undi, optimized with DPO training.
## Dpopenhermes 7B V2
Organization: openaccess-ai-collective
License: Apache-2.0
Tags: Large Language Model, Transformers, English

DPOpenHermes 7B v2 is the second RL fine-tuned model based on OpenHermes-2.5-Mistral-7B, using Direct Preference Optimization (DPO) for reinforcement learning with the Intel/orca_dpo_pairs and allenai/ultrafeedback_binarized_cleaned preference datasets. An illustrative training sketch follows this entry.
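As a rough illustration of the DPO setup described above, the sketch below uses HuggingFace TRL's DPOTrainer on the OpenHermes-2.5-Mistral-7B base model with the Intel/orca_dpo_pairs preference data. This is a minimal sketch, not the project's actual training script: it assumes a recent TRL release (DPOConfig and the processing_class argument; older versions use tokenizer= instead), and all hyperparameters are illustrative placeholders.

```python
# Minimal DPO fine-tuning sketch with HuggingFace TRL (assumed recent trl version).
# Not the authors' actual training recipe; hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "teknium/OpenHermes-2.5-Mistral-7B"  # base model named in the card
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # DPO batching needs a pad token

# One of the preference datasets named above; DPO expects
# "prompt" / "chosen" / "rejected" columns, so the "question" column is renamed.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.rename_column("question", "prompt")
dataset = dataset.remove_columns(["system"])

training_args = DPOConfig(
    output_dir="dpopenhermes-sketch",
    beta=0.1,                       # weight of the implicit KL penalty toward the reference model
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,                    # ref_model omitted; TRL keeps a frozen copy as the reference
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

The second preference dataset mentioned in the card (allenai/ultrafeedback_binarized_cleaned) could be prepared the same way and concatenated before training.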
## 14B DPO Alpha
Organization: CausalLM
Tags: Large Language Model, Transformers, Supports Multiple Languages

CausalLM/14B-DPO-α is a large-scale causal language model supporting Chinese and English text generation, with strong performance on MT-Bench evaluations.